gorating and a constant source of new ideas. My other committee members, Dr. Frank
Abstract

Recent developments using directed acyclical graphs (i.e., influence diagrams and
Bayesian networks) for knowledge representation have lessened the problems of using
probability in knowledge-based systems (KBS). Most current research involves the
efficient propagation of new evidence, but little has been done concerning the
maintenance of domain-specific knowledge, which includes the probabilistic information
about the problem domain. By making use of conditional independencies represented in
the graphs, however, probability assessments are required only for certain variables
when the knowledge base is updated.

The purpose of this study was to investigate, for those variables which require
probability assessments, ways to reduce the amount of new knowledge required from
the expert when updating probabilistic information in a probabilistic knowledge-based
system. Three special cases (ignored outcome, split outcome, and assumed constant
outcome) were identified under which many of the original probabilities (those already
in the knowledge base) do not need to be reassessed when maintenance is required.

Although some reduction in the number of probability assessments can be
achieved when the special cases apply, it appears other areas may be more productive in
reducing the level of effort needed to maintain probabilistic KBS's. Topics
recommended for future research include the development of efficient propagation
techniques for multiply connected graphs, and investigation of methods to make the
probability encoding process more efficient.
MAINTENANCE OF PROBABILISTIC KNOWLEDGE-BASED SYSTEMS


I. Background

Recent years have shown a growing use of artificial intelligence (AI) techniques, in
particular that of knowledge-based (expert) systems (KBS), as an aid in the decision
making process. One of the current areas of interest in AI is the representation of
uncertainty because, as Lindley states, “. . . we want to study uncertainty . . . to be able
to make decisions in the face of uncertainty” (6:130). Due to difficulties in the use of
probabilities in knowledge-based systems, most of the AI systems in use today either ignore
uncertainty in the problem domain or use non-probabilistic approaches for representing
uncertainty. However, recent research has been somewhat successful in reducing, even
eliminating, some of these difficulties (1; 7).

Concerns About Probability in KBS

Why has probability been largely ignored in most of the knowledge-based systems
to date? Henrion (3:4), Pearl (7:242,252), and Rich (9:192-193) discuss reasons why
Bayesian probabilities have not been used in most knowledge-based systems (Table 1).
While Rich only lists reasons for not using probabilities, Henrion and Pearl present
rebuttals to some of the arguments against the use of probabilities in knowledge-based
systems. The arguments against using probability are addressed to varying degrees in
the literature; some are shown not to be inherent problems in the use of probability,
while others are still open questions.


Table 1. Why Not Probability in KBS?

Reason

    Requires unrealistic independence assumptions
    Cannot handle second order uncertainty
    Not how people do it
    Computationally intractable
    Inference process hard to explain
    Requires vast amounts of data (collection too hard)
    Method used for representation of uncertainty does not matter
    Difficult to modify knowledge base due to large number of complex interactions

1. Unrealistic Independence Assumptions. One critical distinction must be noted
when reading the literature regarding the issue of independence assumptions: for
probabilistic knowledge-based systems, a particular implementation may make independence
assumptions to avoid computational complexity, but probability theory can readily
handle dependencies among sources of evidence. So, while these arguments may be
valid for some specific knowledge-based systems which use probability (e.g., Prospector),
they do not hold for all such systems (e.g., ALTERID) (9:193; 1:161). Chapter II shows
how knowledge-based systems based on influence diagrams can update belief based on
dependent sources of evidence.

As Henrion shows (3:6-8), other methods of representing uncertainty (e.g.,
certainty factors or fuzzy set theory) make assumptions about dependence/independence of
the sources of evidence. For example, the “and” and “or” fuzzy set operators assume the
maximum possible correlation between evidential sources. Thus the view that fuzzy set
theory requires no assumptions about dependence (14:78,79) is misleading. There is
merely no terminology, no concept, for such evidential dependencies in fuzzy set theory
(3:7).

2. Inability to Represent Second Order Uncertainty. Another of the arguments
against the use of probability in knowledge-based systems is that there is no mechanism
for dealing with second order uncertainty, that is, uncertainty about the probabilities.
An example would be an expert who was unwilling or unable to provide the knowledge
engineer with a definite probability number: “Well, the chances of A happening lie
somewhere around .45, but could range anywhere from .3 to .5.” Henrion (3:5) states
that using the mean value to estimate the probability “. . . is often sufficient . . . unless
decisions about gathering new information are being contemplated,” or that a range of
probabilities may be specified to represent this second order uncertainty.

3. People Do Not Use Probability for Reasoning. The critique that “probability
should not be used to represent uncertainty in knowledge-based systems because it does
not reflect the way people reason” lies at the very heart of the disagreement between
proponents of alternate uncertainty representations, such as fuzzy set theory, and
proponents of Bayesian probability. Close examination of the many discussions dealing
with the various representations for uncertainty (5, 6, 12, 14) reveals that the primary
basis for disagreement over which representation is “best” is one of philosophy rather
than one of method. The proponents of probability adopt a normative view of decision
making, saying that it is better to represent uncertainty, not as people usually do, but
rather as they should do if they desire to act in a consistent and logical manner.
Proponents of the other representations subscribe to the descriptive approach, which
promotes the view that uncertainty should be represented in a manner compatible with the
way people actually represent uncertainty. This difference of philosophy is clearly the
primary issue in Zadeh's rebuttal (6:21) to Lindley's claim that “. . . the only
satisfactory description of uncertainty is probability” (5:113). It is quite likely that this basic,
philosophical issue is one of the stumbling blocks in the universal adoption of any one
method for representing uncertainty in knowledge-based systems.

4. Computationally Intractable. One of the critical complaints about using
probability in knowledge-based systems is the computational requirements. For example, the
full joint distribution for n variables, each with two possible outcomes, contains 2^n
probabilities. Reducing this exponential complexity has been the primary area of research
in the use of probabilities in knowledge-based systems. Pearl has been especially active
in this area, applying the use of directed, acyclical graphs (Bayesian nets) to the
problem. Thus far, his results are limited to problems which can be represented in singly
connected directed graphs, where there is no more than one (undirected) path between
any two nodes (7:249).
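The size of the problem can be made concrete with a small calculation. The following Python sketch is an illustration only, not part of the cited work: it compares the 2^n entries of a full joint distribution with the number of probabilities that must be assessed when each binary variable is conditioned only on its direct predecessors in a graph. The function names and the 20-variable chain are hypothetical.

```python
def full_joint_size(n_vars: int, n_outcomes: int = 2) -> int:
    """Number of probabilities in the full joint distribution."""
    return n_outcomes ** n_vars


def factored_size(parents: dict) -> int:
    """Probabilities to assess when each binary node stores P(node | parents).

    One number per configuration of the parents is enough for a binary
    node, because the probability of the complementary outcome is implied.
    """
    return sum(2 ** len(p) for p in parents.values())


# Hypothetical singly connected chain X1 -> X2 -> ... -> X20.
n = 20
chain = {f"X{i}": ([] if i == 1 else [f"X{i - 1}"]) for i in range(1, n + 1)}

print(full_joint_size(n))    # entries in the full joint distribution
print(factored_size(chain))  # one marginal plus two numbers per later node
```

For this chain the full joint distribution has over a million entries, while the graph-based representation needs only a few dozen assessments. The saving shrinks as nodes acquire more predecessors.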

Another representation which may somewhat alleviate this problem is the use of
influence diagrams in implementing a probabilistic inference scheme in knowledge-based
systems (1). Originally developed as a tool for decision analysis, influence diagrams
have been used as an integral part of ALTERID, a tool for building knowledge-based
systems which explores the ability to combine probabilistic and logical inference in the
same system. In these systems, the individual evidences and hypotheses (conclusions)
are represented as nodes in a directed graph. The arcs between the nodes represent
dependencies between the probabilities for those nodes.

The principal difference between influence diagrams and Bayesian nets is the
availability of decision and value nodes in the influence diagram. These nodes allow the use
of normative decision theory (i.e., maximum expected utility) in the decision process.
Shachter (10:8-16) gives an algorithm which can be used for probabilistic inferencing.
Chapter II briefly describes how influence diagrams and Shachter's algorithm may be
used in probabilistic knowledge-based systems.

5. Inference Process is Hard to Explain. In order to effectively use
knowledge-based systems in the decision making process, it is necessary to gain the user's
confidence in the system. In many cases, this requires that the user understands the
process  by  which  the  recommended  action  is  obtained.  Henrion  gives  arguments  from 
Pearl  and  Spiegelhalter  showing  how  this  can  be  achieved  using  either  1)  the  logarithm 
of  the  likelihood  ratio  as  an  additive,  relative  measure  of  the  impact  new  evidence  has 
on  a  given  conclusion;  or  2)  by  stepping  through  the  inference  network  (i.e.,  influence 
diagram  or  Bayesian  network),  where  a  “simple,  intuitively  meaningful”  explanation 
can  be  given  at  each  step  (3:5).  If,  as  Pearl  argues,  people  reason  based  on  “low-order 
marginal  and  conditional  probabilities  defined  over  small  clusters  of  propositions,”  then 
in  most  cases  the  underlying  inference  network,  obtained  from  the  expert,  is  relatively 
sparse.  Thus  the  explanation  at  each  step  involves  relatively  few  propositions,  making 
for  a  more  intuitive  description. 
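The first explanation method can be illustrated with a short Python sketch. It is an illustration only, and it assumes that the pieces of evidence are conditionally independent given the hypothesis, so that their log likelihood ratios simply add. The prior and the likelihoods are hypothetical numbers, not values from any cited system.

```python
import math


def log_odds(p: float) -> float:
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


def from_log_odds(x: float) -> float:
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def weight_of_evidence(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Log likelihood ratio: the additive impact of one piece of evidence."""
    return math.log(p_e_given_h / p_e_given_not_h)


# Hypothetical prior and evidence: (name, P(e | h), P(e | not h)).
prior = 0.20
evidence = [("e1", 0.80, 0.20), ("e2", 0.60, 0.30)]

total = log_odds(prior)
for name, p_h, p_not_h in evidence:
    w = weight_of_evidence(p_h, p_not_h)
    total += w  # each item shifts the belief by its own weight
    print(f"{name}: weight {w:+.3f}, belief in h now {from_log_odds(total):.3f}")

posterior = from_log_odds(total)
```

Because each weight is reported separately, a user can see which observation moved the conclusion and by how much, which is the kind of relative measure the argument above relies on.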

6. Data Requirements. Henrion attributes the mistaken view that vast amounts of
data are required for the use of probabilities to a frequency-based interpretation of
probability. This “problem” is not applicable to the Bayesian view of probability, where
probability is a measure of an individual's degree of belief, not necessarily an absolute
measure obtained by statistical sampling (3:5). Henrion also claims that the number of
probabilities which must be encoded from the expert will not be inordinate if the
inference structure reflects the way the expert thinks about the problem, a view shared by
Shachter and Heckerman (11:56).

This, however, does not imply that it is a simple matter to encode probabilities;
on the contrary, decision analysts (4:30-10) find this process to be as time consuming
and as full of pitfalls as any part of the knowledge engineering process for rule-based
systems. It is important to note that the difficulty does not lie in obtaining a number or
even a set of numbers. The hard part is obtaining the correct numbers: those that
form a coherent set, relative to the laws of probability. It is in this requirement for
coherence that probability differs most drastically from other methods of representing
uncertainty.

7. Method Used to Represent Uncertainty Does Not Matter. Of critical importance
is the complaint that the method chosen for the representation of uncertainty, be it
certainty factors, fuzzy set theory, or probability, really does not have much of an effect
on the final action recommended by an expert system. Henrion briefly reviews two
studies done on the MYCIN system (3:13-14). When the standard MYCIN certainty factor
method for combining evidence was used, one-fourth of the cases indicated that certain
evidence weakened the chance of a given hypothesis, when, in fact, that outcome was
more likely, given the evidence! The other study examined granularity of the certainty
factors, where changing from a continuous scale to a five-point scale (-1, -.5, 0, .5, 1)
caused 9 out of 10 organisms to be misidentified. Although these studies do not fully
answer the question of whether the choice of uncertainty representation makes a
difference in the final result, they certainly raise questions about the common belief that
it makes no difference. Henrion notes the lack of information in this area, and calls for
further study.
8. Difficult to Modify the Knowledge Base. One difficulty of probabilistic
knowledge-based systems that has received relatively little attention in the literature is
that of maintaining the domain-specific knowledge base, which includes all of the
probabilistic information. The use of probability to represent uncertainty does not mean
that once such a knowledge-based system is built it will never need to be changed. As
Waterman so eloquently states, “Expert systems . . . will make mistakes” (13:30). Just
as rule-based systems must be maintained throughout their use, probabilistic
knowledge-based systems are subject to changes in the knowledge base. This
maintenance effort may be caused by changes in the underlying problem domain, an incorrect
designation of the outcome space, or even an error in the initial encoding of the
probabilities.

Heckerman and Horvitz (2:125) briefly address this concern. They argue that the
steps necessary in maintaining the underlying inference structure (Bayesian network)
include: 1) reassessment of the dependency structure by the expert, where changes in
the arcs, node outcomes, or number of nodes are noted; and 2) reassessment of the
probability distribution for each node whose incoming arcs changed. This holds for either
adding or deleting a node. Little more than this brief discussion of probabilistic
knowledge-based system maintenance is reported in the literature.
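The second maintenance step can be sketched in Python as follows. This is an illustration under simplifying assumptions, not Heckerman and Horvitz's procedure: it considers only changes to arcs and to the set of nodes, it ignores changes to a node's outcomes, and the graphs and the function name are hypothetical.

```python
def nodes_to_reassess(old: dict, new: dict) -> set:
    """Nodes of the revised network whose incoming arcs differ from the old one.

    Each network maps a node name to the set of nodes with an arc into it.
    A node that did not exist before is always returned, because it has no
    distribution yet.
    """
    return {node for node, parents in new.items() if old.get(node) != parents}


# Hypothetical three-node network.
old_net = {"A": set(), "B": {"A"}, "C": {"A", "B"}}

# Case 1: a new node T is added, conditioned on C.
added = {"A": set(), "B": {"A"}, "C": {"A", "B"}, "T": {"C"}}

# Case 2: a new node D is added as a predecessor of C.
added_parent = {"A": set(), "B": {"A"}, "C": {"A", "B", "D"}, "D": set()}

# Case 3: node B is deleted, so C loses an incoming arc.
deleted = {"A": set(), "C": {"A"}}

for label, net in [("add T", added), ("add D", added_parent), ("delete B", deleted)]:
    print(label, sorted(nodes_to_reassess(old_net, net)))
```

Nodes outside the returned set keep their existing distributions, which is why the graph structure limits the number of new assessments needed from the expert.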

As indicated in this brief overview, research is relieving some of the concerns about
using probability in knowledge-based systems. The work of Breese, Pearl, Shachter,
and others is leading to more efficient methods of updating probabilities in
knowledge-based systems through the use of Bayesian networks or influence diagrams. However,
one key area that has not been adequately addressed is the maintenance of the
underlying inference network in such knowledge bases.


Specific Problem


There has been little study of methods which can be used to minimize the impact
of modifying the domain-specific knowledge base of a probabilistic KBS. This process,
which involves obtaining further information from the expert(s) through knowledge
engineering, can be very time-intensive. As more efficient methods for using
probabilities in knowledge-based systems become available through the use of the new
representation schemes, and as these systems are used in domains which may require more
changes to the underlying knowledge base, reductions in the level of effort required to
obtain new information become more critical.

Research Question

For knowledge-based systems which use probability to represent uncertainty, how
can changes to probabilistic information in the domain-specific knowledge bases be
managed to reduce the level of effort required in updating the system?

Subsidiary Questions

This research question can be partitioned into the following three subsidiary
questions. The information necessary to answer the research question is provided by
answering all of these questions.

What changes can occur in probabilistic representations, and what are the
corresponding changes to the knowledge representation of a probabilistic
knowledge-based system?

Which types of changes may lend themselves to a reduction in the effort required
to encode new probabilities, and how much reduction may be achieved?

For knowledge system tools which use influence diagrams to represent probabilistic
knowledge, such as ALTERID, how can the knowledge system builder make use of the
effort-saving cases identified in the second subsidiary question?

Scope 

This  investigation  uses  the  following  assumptions  as  a  starting  point  in  the 
research  effort.  These  underlying  assumptions  are  critical  to  this  research  effort. 

1. The use of probabilities to represent uncertainty in knowledge systems is
desirable, or in some cases necessary, in order to adequately model the
uncertainties present in the problem domain and enable normative analysis.
Additionally, these probabilities can be encoded from the expert's beliefs, as
a part of the knowledge engineering process, but this research effort does not
examine methods for encoding these probabilities. This probabilistic
information is based on a specific state of information, that is, all of the
knowledge which the expert used in determining the probabilistic
information. This Bayesian approach to probabilities is consistent with much of the
current research (1; 7; 3; 10).

2. Changes in the domain-specific knowledge reflect changes in the state of
information available about the problem domain. For example, a test is
developed, after the initial probabilities are encoded, which can provide
additional evidence about some hypothesis of interest, or new information
changes the expert's beliefs about the conditioning effects between the
propositions in the knowledge system.


Using these assumptions as a basis, the research questions are directed at the
issues concerning the difficulty of obtaining probabilities and the complexity in the
database modification process. By reducing the quantity of information needed, a
reduction in the effort required in modifying the knowledge base may be achieved.

Summary

The major difficulties of using probabilities in a KBS, as described in this chapter,
motivated this research effort. Because the assessment of probabilities is a difficult and
time-consuming process, it is desirable to reduce the number of assessments which are
required during the maintenance of probabilistic knowledge-based systems. To define
the framework for this discussion, Chapter II reviews the current research dealing with
the use of probability and directed graphs (such as influence diagrams) to update beliefs
in knowledge-based systems. Chapter III presents special cases under which fewer
probability assessments may be required during the maintenance effort, while Chapter IV
gives the conclusions reached during this thesis effort, as well as possible areas for
further research.




II. Current Status of Probabilistic Knowledge-Based Systems

Before examining the maintenance of probabilistic knowledge-based systems, it is
important to understand the process used to propagate the effects of evidence
throughout such systems. Much of the current research in the use of probabilities to
represent uncertainty in knowledge-based systems centers on two graphical
representations: belief networks (also called Bayesian networks) and influence diagrams (1; 2; 7;
10). This chapter defines terms to assist in the discussion of these representations, and
presents Shachter's algorithm, a flexible method for conducting probabilistic inference.

Influence Diagrams and Belief Networks

Both influence diagrams and belief networks contain information at two levels.
On the most visible level, the network graph consists of nodes corresponding to each
variable (or proposition) in the system, and arcs indicating dependencies between those
nodes. On a lower level, more detailed information, such as probability distributions
and a set of possible outcomes (possible values which the variable may take), is
associated with each node. As is common in the literature (10:3), no distinction between a
node and its associated variable is made throughout the remainder of this thesis.

While both influence diagrams and belief networks contain probabilistic (or
chance) nodes, the influence diagram may also contain decision nodes, which maximize
the expected value of their predecessors, and a value node, which represents a
deterministic function of its predecessors (either chance or decision nodes). Influence
diagrams form a super-set of belief networks. In fact, an influence diagram which
contains no value or decision nodes (called a probabilistic influence diagram) is equivalent
to a belief network (10:3).
Some propagation schemes for belief networks are limited to singly connected
networks, where there is, at most, one undirected path between any two nodes (7:219).
This limitation enables the use of local propagation, with its gains in efficiency and
applicability to parallel processing, for a possible real-time inference capability (7:269).
This capability is highly desirable, but many real-world problems can not obviously be
represented by a singly connected network. One of the most promising approaches to
overcome this limitation is the addition of auxiliary variables to convert a multiply
connected network into a singly connected one (7:269-270). Unfortunately, it is not clear
how these auxiliary variables should be added, what meaning these variables would
have, or even if adding such nodes is always feasible (3:5). Another approach entails
collapsing the multiply connected portion of the graph into a single variable, with
outcomes corresponding to combinations of the collapsed variables' outcomes. However,
this method is liable to exponential complexity in the number of collapsed variables.
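Both points can be made concrete with a short Python sketch. It is an illustration only: the graphs, the function names, and the union-find approach are choices made here, not taken from the cited work. A network is singly connected exactly when its arcs, treated as undirected and with no arc repeated, contain no cycle. The outcome count of a collapsed variable is the product of the outcome counts of the variables it replaces.

```python
import math


def is_singly_connected(nodes, arcs) -> bool:
    """True if there is at most one undirected path between any two nodes.

    Uses union-find: an arc joining two nodes that are already connected
    would create a second undirected path between them.
    """
    root = {n: n for n in nodes}

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for a, b in arcs:
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        root[ra] = rb
    return True


def collapsed_outcome_count(outcome_counts) -> int:
    """Outcomes of a single variable that replaces several collapsed variables."""
    return math.prod(outcome_counts)


# Hypothetical graphs: a chain, and a "diamond" with two paths from A to D.
chain_arcs = [("A", "B"), ("B", "C")]
diamond_arcs = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

print(is_singly_connected("ABC", chain_arcs))
print(is_singly_connected("ABCD", diamond_arcs))
print(collapsed_outcome_count([2] * 10))  # ten binary variables collapsed
```

Collapsing ten binary variables already yields a variable with over a thousand outcomes, which illustrates the exponential growth noted above.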

For the remainder of this thesis, the discussion will center on the use of influence
diagrams (vice belief networks) in knowledge-based systems. The reason for this focus
is simple: influence diagrams may be used to represent any belief network, while only
the probabilistic portion of an influence diagram can be represented in a belief network.
The influence diagram propagation scheme can be used to solve belief networks for
updated probabilities, so results which apply to the probabilistic portion of influence
diagrams are equally applicable to belief networks. Also, most of the literature
concerning belief networks is relevant to the use of influence diagrams.


Definitions and Notation


Before further examination of influence diagrams and their use in knowledge-based
systems, a few helpful definitions and some notation for discussing influence diagrams are
necessary. A more in-depth discussion can be found in Shachter (10:3-7).

The Metastatic Cancer Influence Diagram. The metastatic cancer influence diagram
(depicted in Figure 1) is used to illustrate the definitions presented in this section, and
to demonstrate other concepts throughout the remainder of this thesis.


Figure  1.  Influence  Diagram  for  Metastatic  Cancer  Model 

This model, also used by Pearl (8:218), is constructed in a causal direction. Based on
the expert's current state of information (&), the presence of metastatic cancer
(MC=mc) in a patient can possibly cause increased serum calcium (IC=ic), a brain
tumor (BT=bt), or both. Either of these two can cause the patient to lapse into a
coma (C=c). In addition, a brain tumor can be the cause of severe headaches (H=h)
for the patient. The associated marginal and conditional probabilities are given in
Table 2.


Table 2. Probabilities for Metastatic Cancer Model

P(MC | &):         P(mc | &) = 0.20

P(IC | MC,&):      P(ic | mc,&) = 0.80        P(ic | ¬mc,&) = 0.20

P(BT | MC,&):      P(bt | mc,&) = 0.20        P(bt | ¬mc,&) = 0.05

P(C | IC,BT,&):    P(c | ic,bt,&) = 0.80      P(c | ic,¬bt,&) = 0.80

                   P(c | ¬ic,bt,&) = 0.80     P(c | ¬ic,¬bt,&) = 0.05

P(H | BT,&):       P(h | bt,&) = 0.80         P(h | ¬bt,&) = 0.60

This causal representation is just one of the 5! logically equivalent influence
diagram representations for this problem (1:723). Each of these 5! representations
results in the same joint distribution.
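The joint distribution implied by the causal representation can be computed directly from the table. The Python sketch below is an illustration only. It multiplies the marginal and conditional probabilities of Table 2 along the causal ordering. It reads P(bt | ¬mc) as 0.05 and P(c | ic,¬bt) as 0.80, which is an assumption about the printed values. The conditioning on the state of information (&) is left implicit, and all names in the code are hypothetical.

```python
from itertools import product

# Values from Table 2; True means the event is present.
P_MC = 0.20
P_IC = {True: 0.80, False: 0.20}            # P(ic | MC)
P_BT = {True: 0.20, False: 0.05}            # P(bt | MC)
P_C = {(True, True): 0.80, (True, False): 0.80,
       (False, True): 0.80, (False, False): 0.05}  # P(c | IC, BT)
P_H = {True: 0.80, False: 0.60}             # P(h | BT)


def prob(p_true: float, value: bool) -> float:
    """Probability of a binary outcome given the probability of 'present'."""
    return p_true if value else 1.0 - p_true


def joint(mc: bool, ic: bool, bt: bool, c: bool, h: bool) -> float:
    """P(MC, IC, BT, C, H) as the product of each node's conditional."""
    return (prob(P_MC, mc) * prob(P_IC[mc], ic) * prob(P_BT[mc], bt)
            * prob(P_C[(ic, bt)], c) * prob(P_H[bt], h))


states = list(product([True, False], repeat=5))
total = sum(joint(*s) for s in states)          # should be 1
p_coma = sum(joint(*s) for s in states if s[3])  # marginal P(c)

print(len(states), round(total, 6), round(p_coma, 4))
```

The 32 joint probabilities are generated from only 11 assessed numbers. Any of the logically equivalent diagrams would reproduce the same 32 values from a different set of conditionals.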

Definitions for Influence Diagrams. As indicated in Table 2, the probability
distribution for the outcomes of a node (variable) is, in many cases, a conditional probability
distribution. The conditioning for a node, I, is shown in the influence diagram by arcs
into I from each of the conditional predecessors. The notation C(I) is used to represent
the set of conditional predecessors of node I; that is, the set of all nodes with an arc
going directly into node I. In the metastatic cancer example, the conditional
predecessors for the coma node (C) are the increased serum calcium (IC) and brain tumor (BT)
nodes. If C(I) is the empty set, then a marginal distribution is indicated, since I has no
incoming arcs. The metastatic cancer node (MC) is an example of a node with a
marginal distribution.

The set of weak predecessors of a node I, denoted by W(I), is defined as all nodes,
J, for which there is a directed path from J to I. C(I) is a subset of W(I). For the
metastatic cancer influence diagram headache node (H), C(H) is the set {BT}, and W(H)
is {BT, MC}. Similarly, D(I), the direct successors of node I, is the set of nodes for
which I is a conditional predecessor. For example, D(MC) is the set {IC, BT}.
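These three sets can be computed mechanically from the arcs of the diagram. The Python sketch below is an illustration only. The arc list follows the causal description of the metastatic cancer model given earlier, and the function names are hypothetical.

```python
# Arcs of the metastatic cancer influence diagram, as (from, to) pairs.
ARCS = [("MC", "IC"), ("MC", "BT"), ("IC", "C"), ("BT", "C"), ("BT", "H")]


def conditional_predecessors(node, arcs=ARCS) -> set:
    """C(I): nodes with an arc going directly into the node."""
    return {a for a, b in arcs if b == node}


def direct_successors(node, arcs=ARCS) -> set:
    """D(I): nodes for which the node is a conditional predecessor."""
    return {b for a, b in arcs if a == node}


def weak_predecessors(node, arcs=ARCS) -> set:
    """W(I): nodes from which a directed path leads to the node."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for pred in conditional_predecessors(current, arcs):
            if pred not in found:
                found.add(pred)
                stack.append(pred)
    return found


print(sorted(conditional_predecessors("C")))
print(sorted(weak_predecessors("H")))
print(sorted(direct_successors("MC")))
```

An empty result from conditional_predecessors, as for the MC node, corresponds to a node that carries a marginal distribution.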


The final concept that needs to be introduced is that of node ordering. A list of
nodes is ordered if “none of the weak predecessors of a node follow the node in the list”
(10:6). Any list of nodes can be ordered by placing a node with no conditional
predecessors on the ordered list, then adding other nodes, one at a time, whose conditional
predecessors, if any, are already on the ordered list (10:6-7). There may be more than
one ordered list for any given influence diagram.
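The ordering procedure just described can be sketched in Python as follows. This is an illustration of the procedure as stated here, not a reproduction of Shachter's code. The arc list is that of the metastatic cancer model, and the function names are hypothetical.

```python
ARCS = [("MC", "IC"), ("MC", "BT"), ("IC", "C"), ("BT", "C"), ("BT", "H")]


def ordered_list(nodes, arcs=ARCS) -> list:
    """Build an ordered list by repeatedly adding a node whose conditional
    predecessors, if any, are already on the list."""
    preds = {n: {a for a, b in arcs if b == n} for n in nodes}
    ordered, remaining = [], list(nodes)
    while remaining:
        ready = [n for n in remaining if preds[n] <= set(ordered)]
        if not ready:
            raise ValueError("the graph contains a directed cycle")
        ordered.append(ready[0])
        remaining.remove(ready[0])
    return ordered


def is_ordered(seq, arcs=ARCS) -> bool:
    """True if no weak predecessor of a node follows it in the list.

    Checking every arc is sufficient: if each arc points forward in the
    list, then every directed path does too.
    """
    position = {n: i for i, n in enumerate(seq)}
    return all(position[a] < position[b] for a, b in arcs)


result = ordered_list(["H", "C", "BT", "IC", "MC"])
print(result, is_ordered(result))
print(is_ordered(["MC", "IC", "BT", "C", "H"]))  # a different valid ordering
print(is_ordered(["IC", "MC", "BT", "C", "H"]))  # IC precedes its predecessor MC
```

The two valid orderings printed for the same diagram illustrate that the ordered list is not unique.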

Advantages of Influence Diagrams

Heckerman and Horvitz (2) discuss the advantages of belief networks over
rule-based representations. In particular, they show that such graphical representations are
more natural and efficient in representing dependencies within the problem domain.
There is, as they point out, an increased cost for representing more complex
dependencies in any knowledge-based system: more probabilities must be encoded, and
computational costs are greater. They claim, however, that rule-based representations are at
least as costly, and usually more costly, than belief networks (and hence influence
diagrams) in representing these dependencies (2:125).

Another advantage of influence diagrams is the ability to represent domain
knowledge in a causal direction. More recent literature (11; 7) indicates that people find
it easier to describe influences in a causal direction, that is, based on hypotheses which
cause evidences. This is directly opposite to the direction in which this information is
used in many rule-based systems, which require the probability of the hypothesis given the
evidences (11:55). Recall the metastatic cancer influence diagram, which was constructed
in the causal direction. The influence diagram depicted in Figure 2, which follows the
evidential direction required for a rule-based system, is logically equivalent to the causal
graph shown in Figure 1.


Figure 2. I.D. for Metastatic Cancer Model (Evidential Direction)

A  typical  rule  base  for  this  model,  but  one  which  does  not  capture  the  dependencies 
indicated  in  the  influence  diagram  in  Figure  2,  might  resemble  the  following: 


IF IC=increased_calcium AND BT=brain_tumor
    THEN MC=metastatic_cancer (.#)
IF IC=increased_calcium THEN MC=metastatic_cancer (.#)
IF BT=brain_tumor THEN MC=metastatic_cancer (.#)
IF C=coma AND H=severe_headaches THEN BT=brain_tumor (.#)
IF C=coma THEN IC=increased_calcium (.#)
IF C=coma THEN BT=brain_tumor (.#)
IF H=severe_headaches THEN BT=brain_tumor (.#)


These rules, and the associated degrees of belief (.#), would be acquired as part of the
knowledge engineering process. Since all the dependencies are not represented by this
rule-based representation, an ad hoc method for combining degrees of belief (i.e.,
MYCIN's certainty factor method) may be used, and the rules may need to be revised
to achieve better results.

In order to accurately represent the dependencies shown in Figure 2, a larger rule
base is required:


IF IC=increased_calcium AND BT=brain_tumor
    THEN MC=metastatic_cancer (.##)
IF IC=increased_calcium AND BT=not_brain_tumor
    THEN MC=metastatic_cancer (.##)
IF IC=not_increased_calcium AND BT=brain_tumor
    THEN MC=metastatic_cancer (.##)
IF IC=not_increased_calcium AND BT=not_brain_tumor
    THEN MC=metastatic_cancer (.##)
IF C=coma AND BT=brain_tumor
    THEN IC=increased_serum_calcium (.##)
IF C=coma AND BT=not_brain_tumor
    THEN IC=increased_serum_calcium (.##)
IF C=not_coma AND BT=brain_tumor
    THEN IC=increased_serum_calcium (.##)
IF C=not_coma AND BT=not_brain_tumor
    THEN IC=increased_serum_calcium (.##)
IF C=coma AND H=severe_headaches THEN BT=brain_tumor (.##)
IF C=coma AND H=not_severe_headaches THEN BT=brain_tumor (.##)
IF C=not_coma AND H=severe_headaches THEN BT=brain_tumor (.##)
IF C=not_coma AND H=not_severe_headaches
    THEN BT=brain_tumor (.##)
IF C=coma THEN H=severe_headaches (.##)
IF C=not_coma THEN H=severe_headaches (.##)


These fourteen rules represent the dependencies shown in the influence diagram, and the
degree of belief (.##) for each rule would be acquired as part of the knowledge
engineering process. This compares with the required assessment of only eleven
probabilities for the influence diagram representation, as shown in Table 2. More complex
dependencies may cause the rule-based method to degenerate to a very basic look-up
table, where no chaining would be possible since each combination of all evidences and
hypotheses is represented by a corresponding rule (2:123). Thus, only one rule could
possibly match.

One of the most important, and perhaps the most overlooked, advantages of the
influence diagram is referred to by Heckerman and Horvitz (2:125) as a local notion of
modularity. In actuality, this is just the conditional independence shown by the lack of
an arc from one node into another in an influence diagram. Conditional independence of
two nodes exists when, given the values of the conditional predecessors of one node, the
probability distribution of that node is independent of the value of the other node. As
a consequence of the conditional independence represented in an influence diagram, the
probability distribution of any node, given the values of its conditional predecessors, is
independent of the other weak predecessors of that node. This can be written as

$P[I = i \mid W(I), \&] = P[I = i \mid C(I), \&]$  (1)

As Shachter (10:7) and Heckerman and Horvitz (2:125) state, the marginal and conditional
probabilities contained in an influence diagram contain all the information necessary to
construct the full joint distribution. If an influence diagram consists of the chance nodes
$I_1, I_2, \ldots,$ and $I_m$, then the joint distribution is simply the product of all the
marginal and conditional distributions in the graph:

$P[I_1 = i_1, I_2 = i_2, \ldots, I_m = i_m \mid \&] = \prod_{j=1}^{m} P[I_j = i_j \mid C(I_j), \&]$  (2)

As  Heckerman  and  Horvitz  indicate,  this  property  can  greatly  reduce  the  data  collection 
associated  with  modifying  the  influence  diagram.  There  is  no  need  to  reassess  all  of  the 
probabilities  in  an  influence  diagram  when  a  change  occurs,  since  only  the  direct  prede¬ 
cessors  of  a  node  influence  its  probability  distribution.  Only  nodes  which  have  their 
incoming  arc(s)  modified  need  to  have  their  distributions  reassessed.  This  particular 
property  is  critical  to  the  development  of  the  special  cases  in  Chapter  III. 
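
As a rough illustration of Eq (2), the sketch below multiplies the marginal and
conditional probabilities of the causal metastatic cancer diagram to obtain joint
probabilities. The numbers in P_TRUE are hypothetical placeholders chosen for this
example; they are not the assessed values of Table 2. The names PREDECESSORS, P_TRUE,
joint_probability, and all_assignments are likewise assumptions made for illustration.

```python
from itertools import product

# Structure described in the text: MC -> IC, MC -> BT, IC -> C, BT -> C, BT -> H.
PREDECESSORS = {"MC": [], "IC": ["MC"], "BT": ["MC"], "C": ["IC", "BT"], "H": ["BT"]}

# HYPOTHETICAL values of P[node is present | predecessor values], keyed by the
# tuple of predecessor values in the order listed in PREDECESSORS.
P_TRUE = {
    "MC": {(): 0.2},
    "IC": {(True,): 0.7, (False,): 0.1},
    "BT": {(True,): 0.3, (False,): 0.05},
    "C": {(True, True): 0.9, (True, False): 0.6,
          (False, True): 0.5, (False, False): 0.02},
    "H": {(True,): 0.8, (False,): 0.4},
}


def joint_probability(assignment):
    """Eq (2): product of each node's marginal or conditional probability."""
    result = 1.0
    for node, preds in PREDECESSORS.items():
        p_true = P_TRUE[node][tuple(assignment[p] for p in preds)]
        result *= p_true if assignment[node] else 1.0 - p_true
    return result


def all_assignments():
    """Yield every combination of present/absent values for the five nodes."""
    nodes = list(PREDECESSORS)
    for values in product([True, False], repeat=len(nodes)):
        yield dict(zip(nodes, values))
```

Summing joint_probability over all_assignments() gives one (up to rounding), and P_TRUE
holds eleven numbers, one per required assessment in the binary case.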

To see the value of this property more clearly, consider the following change to the
metastatic cancer example. The experts have determined that the severity of a
patient's headaches may be caused by increased serum calcium as well as a brain tumor,
so an arc is added from the increased serum calcium node (IC) to the severe headache
node (H). The new joint distribution can be obtained either by 1) assessing the joint
distribution directly; 2) assessing all of the conditional and marginal distributions for
the new influence diagram; or 3) assessing only the new distribution for H. To reassess
the existing probabilities in the joint distribution directly would require 2^5 - 1 = 31
probability assessments, since the diagram has five two-outcome nodes. Only thirteen
assessments are needed to define the marginal and conditional distributions for the new
influence diagram, so some savings are realized by choosing to represent the joint as the
product of marginal and conditional distributions. Even more savings are obtained because
of local modularity, as only four assessments are required to define H's new conditional
distribution.
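
The assessment counts quoted in this section can be checked with a short calculation.
The sketch below assumes every node has two outcomes, as in the metastatic cancer
example; the function and variable names are hypothetical.

```python
from math import prod


def assessments_per_node(predecessors, n_outcomes):
    """For each node: (outcomes - 1) probabilities for every combination of
    outcomes of its conditional predecessors."""
    return {
        node: (n_outcomes[node] - 1) * prod(n_outcomes[p] for p in preds)
        for node, preds in predecessors.items()
    }


# Causal metastatic cancer diagram as described in the text.
ORIGINAL = {"MC": [], "IC": ["MC"], "BT": ["MC"], "C": ["IC", "BT"], "H": ["BT"]}
# Same diagram after adding the arc from IC to H.
MODIFIED = dict(ORIGINAL, H=["BT", "IC"])
BINARY = {node: 2 for node in ORIGINAL}

original_counts = assessments_per_node(ORIGINAL, BINARY)
modified_counts = assessments_per_node(MODIFIED, BINARY)
joint_count = prod(BINARY.values()) - 1  # direct assessment of the full joint
```

Here sum(original_counts.values()) is 11, sum(modified_counts.values()) is 13,
joint_count is 31, and modified_counts["H"] is 4, matching the figures in the text.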


Propagation Using Influence Diagrams

One  of  the  main  attractions  of  influence  diagrams  lies  in  the  flexibility  provided 
through  the  use  of  a  few  simple  operations.  Nowhere  is  this  flexibility  more  evident 
than  in  Shachter's  algorithm  for  solving  any  well-formed  influence  diagram  (one  con¬ 
taining  no  directed  cycles).  His  algorithm  either  provides  the  solution  of  a  general  pro¬ 
babilistic  inference  problem  (i.e.,  the  conditional  probability  distribution  of  a  set  of 
hypotheses,  given  some  set  of  evidences),  or  determines  what  data  are  needed  if  some  of 
the  probabilities  or  outcomes  for  the  influence  diagram  have  not  been  specified  (10:8). 

Before examining this algorithm, a short overview of some of the operations for
manipulating influence diagrams is in order. Only the three primitive operations
required in order to perform probabilistic inference using influence diagrams (10:10-16)
will be covered here; readers are referred to (10) and (4) for other operations. None of
these three operations change the basic, underlying (conditional) probability
distribution for the hypotheses given the evidences (10:10).

Barren Node Elimination. A node which has no successors and is neither a
hypothesis nor an evidence is called a barren node. As Shachter says,

    Since we are not trying to estimate (its probability distribution), we do not observe
    its value, and no other variables are conditioned by it, it is irrelevant to the
    inference problem we are solving. Therefore, we could eliminate the barren node
    without affecting the solution. [10:10]

Two important clarifications should be made. First, no node is inherently barren in an
influence diagram. The definition of being barren is strictly related to a given inference
problem. Take, for instance, the metastatic cancer example presented earlier. If the
hypothesis is "Metastatic cancer is present," or, equivalently, that MC=mc, and the
observable evidence is the presence or absence of severe headaches, then the coma node
(C) is barren, as is the increased serum calcium node (IC) once C is removed. On the
other hand, if the observable evidence is the patient's being in a comatose state or not,
the only barren node is the severe headache node (H). If both C and H are observable,
as would normally be the case, no node is barren without transforming the diagram
further.
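
The dependence of barrenness on the inference problem can be made concrete with a small
sketch. It repeatedly removes nodes that have no successors and are neither hypotheses
nor evidences, using the causal metastatic cancer diagram as described earlier. The
function name and data layout are assumptions for illustration only.

```python
# Conditional predecessors for the causal metastatic cancer diagram.
PREDECESSORS = {"MC": [], "IC": ["MC"], "BT": ["MC"], "C": ["IC", "BT"], "H": ["BT"]}


def eliminate_barren_nodes(predecessors, hypotheses, evidence):
    """Return the nodes removed, in order, by repeated barren node elimination."""
    remaining = {node: set(preds) for node, preds in predecessors.items()}
    removed = []
    while True:
        # A node has a successor if it appears in some remaining predecessor set.
        has_successor = set().union(*remaining.values())
        barren = sorted(
            n for n in remaining
            if n not in has_successor and n not in hypotheses and n not in evidence
        )
        if not barren:
            return removed
        for node in barren:
            del remaining[node]
            removed.append(node)
```

With hypothesis MC and evidence H, the call returns ["C", "IC"]; with evidence C it
returns ["H"]; with both C and H as evidence it returns an empty list, in agreement with
the three situations described above.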

The other clarification relates to the state of information. While it could be
argued that the state of information upon which the influence diagram is based has
changed once a barren node is removed, this can be viewed more as a coarsening of
data. That is, the probabilities of interest have not changed, but some information has
been lost from the influence diagram. It would be quite impossible, given only what is
contained in the reduced influence diagram, to reconstruct the original nodes,
dependencies, and probabilities that were removed.

Arc Reversal. Perhaps the most important operation in the solution of a
probabilistic inference problem is arc reversal. Arc reversals are used for two primary
purposes in Shachter's algorithm. First, arc reversals are used to transform the influence
diagram, making nodes which are neither hypotheses nor evidences barren, so they can
be removed. The second purpose for using arc reversal is to obtain the conditional
distributions of interest. The only requirement for reversing an arc is that there be no
other directed path between the nodes at either end of the arc.
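
The requirement that no other directed path connect the two nodes can be checked with a
simple graph search. The sketch below is one possible way to do so; the function names
are hypothetical, and the extra arc from MC to C in the second diagram is an invented
variation used only to show a case where reversal is not allowed.

```python
def has_directed_path(successors, start, goal, skip_arc):
    """Depth-first search for a directed path from start to goal that does
    not use the arc `skip_arc`."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            if (node, nxt) == skip_arc:
                continue
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def can_reverse(predecessors, i, j):
    """True if the arc i -> j exists and no other directed path runs from i to j."""
    if i not in predecessors[j]:
        raise ValueError("there is no arc from %s to %s" % (i, j))
    successors = {}
    for node, preds in predecessors.items():
        for p in preds:
            successors.setdefault(p, []).append(node)
    return not has_directed_path(successors, i, j, skip_arc=(i, j))


CAUSAL = {"MC": [], "IC": ["MC"], "BT": ["MC"], "C": ["IC", "BT"], "H": ["BT"]}
# HYPOTHETICAL variation: an added arc MC -> C creates a second path MC -> IC -> C.
WITH_EXTRA_ARC = dict(CAUSAL, C=["IC", "BT", "MC"])
```

In CAUSAL every arc can be reversed, while in WITH_EXTRA_ARC the arc from MC to C cannot,
because reversing it would create a directed cycle through IC or BT.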


When an arc from node I to node J is reversed, each node inherits the conditional
predecessors of the other. Then, through summing and application of Bayes' law, the
conditioning between I and J is reversed (10:13):

$P[J = j \mid C_{new}(J), \&] = \sum_{\text{outcomes of } I} P[J = j \mid C_{old}(J), \&] \, P[I = i \mid C_{old}(I), \&]$

$P[I = i \mid C_{new}(I), \&] = \dfrac{P[J = j \mid C_{old}(J), \&] \, P[I = i \mid C_{old}(I), \&]}{P[J = j \mid C_{new}(J), \&]}$

where $C_{new}(J)$ is the combined conditional predecessors of I and J, excluding the node
I, and $C_{new}(I)$ is the new conditional predecessors of J and the node J itself.
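
A minimal numerical sketch of arc reversal follows. To keep it short it treats only the
simplest situation, in which I has no conditional predecessors and J is conditioned on I
alone; the general operation also carries the inherited predecessors along. The
probabilities are hypothetical, the function name is an assumption, and the sketch
presumes every outcome of J has nonzero probability.

```python
def reverse_arc(p_i, p_j_given_i):
    """Reverse the arc I -> J.

    p_i:          {i: P[I=i]}
    p_j_given_i:  {i: {j: P[J=j | I=i]}}
    Returns (p_j, p_i_given_j) with p_i_given_j[j][i] = P[I=i | J=j].
    """
    j_outcomes = list(next(iter(p_j_given_i.values())))
    # Summing over the outcomes of I gives the new distribution for J.
    p_j = {j: sum(p_j_given_i[i][j] * p_i[i] for i in p_i) for j in j_outcomes}
    # Bayes' law gives the new distribution for I, now conditioned on J.
    p_i_given_j = {
        j: {i: p_j_given_i[i][j] * p_i[i] / p_j[j] for i in p_i}
        for j in j_outcomes
    }
    return p_j, p_i_given_j


# HYPOTHETICAL numbers for illustration.
P_I = {"i1": 0.25, "i2": 0.75}
P_J_GIVEN_I = {"i1": {"j1": 0.8, "j2": 0.2}, "i2": {"j1": 0.4, "j2": 0.6}}
P_J, P_I_GIVEN_J = reverse_arc(P_I, P_J_GIVEN_I)
```

The joint probability of each (i, j) pair is the same before and after the reversal,
which is the sense in which the operation leaves the underlying distribution unchanged.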

Deterministic Node Propagation. To gain computational efficiencies, Shachter uses
the notion of a deterministic node, which is a chance node with a degenerate probability
distribution (10:4). Shachter proves that, for any influence diagram which is fully
specified (has complete graphical, outcome, and probability distribution information)
and contains an arc from a deterministic node, I, to another node, J, it is possible to
remove the conditioning of I relative to J (10:11). This is done simply by adding arcs
from the conditional predecessors of I to node J, then removing the arc from I to J.
This is possible since the outcomes of I are determined exactly by the outcomes of I's
conditional predecessors. Additionally, if J is deterministic, it will remain
deterministic.


Node Reduction. Removing nodes from the influence diagram without changing the
underlying probability distribution for the hypotheses given the evidences, called node
reduction, is a combination of the three simple operations above (10:11). Reducing a
deterministic node is extremely simple: merely propagate the deterministic node into
each of its successors, then remove the (now barren) deterministic node.

Reducing a probabilistic node is only slightly more complicated. First, an ordered
list of the direct successors of the node must be obtained. The goal node must be
placed last in the ordered list if it is a direct successor of the node being reduced. Arcs
are reversed in order, then the (now barren) node is removed.

Shachter's Algorithm. Although Shachter's algorithm may be used to solve any
general probabilistic inference problem given a fully specified influence diagram, and is
the most flexible of the available probabilistic inferencing schemes, it is very simple to
execute. To find the solution to any such inference problem, even those with many
variables of interest (i.e., multiple hypotheses), a new deterministic variable, conditioned
on the variable(s) of interest, is added to the diagram as the goal node. The goal node
has no successors, and will gain none throughout the solution process. After adding the
goal node, all that remains is the reduction of all non-evidence nodes from the diagram
(10:15). Although no particular order is required for the solution of the inference
problem, always reducing barren nodes and conditional predecessors of the goal node allows
the solution to be obtained with the smallest possible amount of information specified
in the diagram (10:16). This would be highly desirable in situations where all of the
probabilistic information was not known. A probabilistic KBS could then use
Shachter's algorithm to request the minimum amount of information necessary to solve
the specified probabilistic inference problem.

Use of Influence Diagrams in KBS. Up to now, this chapter has covered the
mechanics of performing probabilistic inference in a general sense. The construction of
the influence diagram from probabilistic nodes and arcs, desirable properties associated
with influence diagrams, the transformations required for probabilistic inferencing, and
a simple, powerful algorithm for the solution of probabilistic inference problems have
all been discussed. But just how can all of this be used in knowledge-based systems?

In many KBS in use today, actions are taken (or recommended) based on a degree
of belief in a proposition, or a set of propositions. That degree of belief is altered
through knowing the outcomes of other, observable propositions. Using an influence
diagram, the entire conditional distribution for the hypotheses, given the evidences, can
be computed in a simple manner. This distribution can then be used as the basis for
the KBS's recommended action, or further normative analysis.

Summary 

This chapter laid the basic foundation for understanding the use of influence
diagrams in reasoning with uncertainty in knowledge-based systems. Influence
diagrams provide an efficient means to represent dependencies among variables, and
contain all the information needed to construct the joint distribution. Shachter's
algorithm provides a simple, flexible way to perform probabilistic inferencing with
influence diagrams. When an influence diagram must be modified, the local modularity
property reduces the number of probabilities that must be assessed. The next chapter
examines situations where further reductions may be possible.

III. Research Methodology and Special Case Development


Just as with rule-based systems, the general domain knowledge of probabilistic
knowledge-based systems is not static. The underlying (domain-specific) knowledge in
the KBS must be changed to reflect the changed state of information when new tests
for existing hypotheses are developed, new hypotheses are formed, or a more thorough
understanding of the problem domain is gained. Heckerman and Horvitz (2:125-126)
briefly discuss the process of adding a node to a Bayesian belief network, but their
discussion holds equally well for influence diagrams. When it is determined (by whatever
means) that the model represented in the influence diagram is no longer adequate, the
first task for the knowledge engineer and the expert is the reassessment of the nodes
and their dependencies represented in the influence diagram. Nodes may be added or
deleted, outcome spaces for individual variables may increase or decrease, arcs may be
added or deleted, or the probability distributions for a variable's outcomes may be
changed, any combination of which indicates a change in the state of information upon
which the graph is based. Because of the local modularity property, after the expert
reassesses the basic underlying dependency structure, the only probability distributions
that must be re-encoded are those associated with nodes that have had some change
made to their outcome space (gaining, losing, or changing outcomes) or incoming arcs
(gaining or losing an incoming arc, or having the outcome space of a conditioning
variable modified) (2:125).


In systems where the underlying dependency structure changes infrequently, and
there is a requirement for real-time propagation of the effects of evidences, the time and
effort required to encode the new distributions may be relatively insignificant. However,
as these systems are applied to problem domains which are highly dynamic, meaning
the underlying dependencies and probabilities often change, a significantly larger
portion of time will be spent encoding probabilities. An example of such a dynamic system
might be a KBS which interprets intelligence data and attempts to identify specific
enemy tactics. As the enemy develops new tactics, and analysts identify discriminators
for indicating when these tactics are being used, new nodes and dependencies are added
to the KBS. This chapter examines ways in which the underlying probabilistic
information can change and possible means to reduce the level of effort required in the
encoding process, relative to these changes.

What Happens When the State of Information Changes?

As indicated earlier, when the underlying state of information changes, the
dependency structure for the influence diagram must be reassessed, and those nodes which
experience a change in their incoming arcs or outcome space must be reassessed. The
more nodes that experience such changes, the more information that must be encoded
by the expert. At the very least, probabilistic information must be encoded for the new
outcomes and variables. Also, any data invalidated by the change in the state of
information must be reassessed, even if the dependency structure did not change. However,
all is not necessarily lost. There may be some circumstances under which all, or nearly
all, of the original probabilistic information is still valid under the new state of
information. Some of these circumstances are identified in the following sections as special
cases which may apply for some state of information changes for an influence diagram.


Taking the First Step


Since the joint distribution is fully represented by the information in an influence
diagram, the initial focus of the research was on keeping the joint distribution from
changing a great deal. Specifically, special cases were developed for: 1) changing the
number of outcomes for individual nodes; and 2) changing the number of variables in
the joint distribution. As these special cases were being developed, it became
increasingly clear that this focus was not the best for purposes of this research effort.
Examining the special cases for the joint distribution revealed that more than one type of
change in the marginal and conditional probabilities could bring about the same change
to the joint distribution. This multiplicity of causes obscured the conditions which
might lead to the special case's being applicable. A more fundamental view, based on
how the probabilistic information would be gathered in a probabilistic KBS, was
adopted.

Special Cases for Marginal and Conditional Probability Distributions

As indicated by Pearl, Shachter, and others, information from experts is more
easily gathered in the form of marginal and conditional distributions (3:5; 7:210; 11:55).
Since information is primarily collected in this manner, it makes much more sense to
examine possible effort-saving special cases from this perspective. The primary
objective is to keep as many of the original probabilities as possible relevant under the new
state of information.

Special cases based on the marginal and conditional distributions can be readily
grouped into those applicable when 1) the outcome space for a variable changes in size;
2) a variable is added or removed from the influence diagram; and 3) an arc between
two nodes is added or removed, changing the conditioning information in the diagram.
The only other change which indicates a new state of information is when underlying
probabilities change. No special cases were found to reduce the number of assessments
required in response to this type of change.

For each special case, we examine, separately, the effects on the node being
changed (either a node experiencing a change in its outcome space, or a new, added
node) and on nodes whose incoming arcs are somehow modified (either by a change in
the outcome space of a conditional predecessor, or by the addition or loss of conditional
predecessors). Application of a special case to one of these nodes neither ensures nor
prohibits its application to another node. For each special case, the effects on the
changed node and on its successors will be discussed. Since exponential growth can
occur when changes are made to the influence diagram, these special cases were
developed primarily with an expansion of the outcome space or number of variables in
mind.

Changes in the Outcome Space. When the change to a new state of information
results in a change in a node's outcome space, the probability distribution for that
node must be reassessed. The distribution of any other nodes which were previously, or
are now, conditioned on the changed node must also be reassessed. Two special cases,
the "ignored outcome" and the "split outcome", may reduce the number of assessments
required.

Ignored Outcome Special Case. Interest in the first special case was motivated
by the following question: if a new, or previously "forgotten," outcome was added to a
node (indicating a new state of information), under what conditions would the original
probabilistic information be of use, and just how could it be used? For this case, the
original outcome space for the changed variable would be mutually exclusive but not
collectively exhaustive.

A simple example clarifies the discussion of this case. Let A be a node with m
outcomes under the original state of information, &. Now the expert perceives a previously
ignored outcome, $a_{m+1}$. This new knowledge (that outcome $a_{m+1}$ exists) indicates a
change in the state of information, and the diagram must be reassessed relative to this
new state of information, &'. If the expert determines that the old probability
distribution for A, given the conditional predecessors C(A), is

$P[A = a_i \mid C(A), \&] = P[A = a_i \mid A \neq a_{m+1}, C(A), \&']$  (4)

then the new probabilities for the original outcomes of A are given by

$P[A = a_i \mid C_j(A), \&'] = \lambda_j \, P[A = a_i \mid C_j(A), \&], \quad i = 1, \ldots, m, \quad j = 1, \ldots, \prod_{X \in C(A)} \|X\|$

where $\|X\|$ denotes the number of outcomes for variable X. $\lambda_j$ is just a scaling
factor for the probability distribution of A given the old state of information and a
specific combination, indexed by the subscript j, of the outcomes for variables in C(A),
and is given by

$\lambda_j = 1 - P[A = a_{m+1} \mid C_j(A), \&']$  (7)

To see more clearly the use of $\lambda_j$, let A (with outcomes $a_1$ and $a_2$) have one
conditional predecessor, X (with outcomes $x_1$ and $x_2$). Let the conditional
probabilities of A under & be:

$P[A = a_1 \mid X = x_1, \&] = \alpha_1 \qquad P[A = a_2 \mid X = x_1, \&] = \alpha_2 = 1 - \alpha_1$
$P[A = a_1 \mid X = x_2, \&] = \beta_1 \qquad P[A = a_2 \mid X = x_2, \&] = \beta_2 = 1 - \beta_1$

If, under a new state of information (&'), A has a new outcome ($a_3$), and the ignored
outcome special case is applicable, then the conditional probabilities can be found as
follows. First, the conditional probabilities of the new outcome, given the outcomes of X,
must be encoded from the expert. Say they are determined to be:

$P[A = a_3 \mid X = x_1, \&'] = \alpha_3$
$P[A = a_3 \mid X = x_2, \&'] = \beta_3$

After these probabilities have been encoded, the $\lambda_j$'s are given by

$\lambda_1 = 1 - \alpha_3$
$\lambda_2 = 1 - \beta_3$

Then the conditional probabilities for the original two outcomes of A are given by

$P[A = a_1 \mid X = x_1, \&'] = \lambda_1 \alpha_1 \qquad P[A = a_2 \mid X = x_1, \&'] = \lambda_1 \alpha_2$
$P[A = a_1 \mid X = x_2, \&'] = \lambda_2 \beta_1 \qquad P[A = a_2 \mid X = x_2, \&'] = \lambda_2 \beta_2$

When considering the addition of k new outcomes (instead of just one), the primary
difference deals with the calculation of the $\lambda_j$. For each possible combination of
outcomes for C(A), $\lambda_j$ is given by

$\lambda_j = 1 - \sum_{i=m+1}^{m+k} P[A = a_i \mid C_j(A), \&']$  (8)

This means, for the expanded variable A, only the marginal or conditional probabilities
for the new outcomes must be encoded. Once these are obtained, a $\lambda_j$ for each
combination of outcomes of C(A) can be computed directly, and the probabilities under &'
for the original outcomes are given by $\lambda_j$ times the corresponding original
probabilities.

The reduction in the required number of encodings depends on the number of old
(m) and new (k) outcomes for A, the number of conditional predecessors for A
($\|C(A)\|$), and the number of outcomes for each predecessor. For comparative
purposes, suppose that each conditional predecessor of A has n outcomes¹. Then the
number of encodings needed to determine A's distribution in the general case is
$(m + k - 1) \times n^{\|C(A)\|}$, since probabilities for all but one of A's m+k outcomes are
needed for each combination of the outcomes in C(A). The probability of the remaining
outcome of A is determined by the property that these probabilities must sum to
one. Similarly, the number of required probability assessments for the ignored outcome
special case is just $k \times n^{\|C(A)\|}$, because only the probabilities for the new
outcomes of A are needed.
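
The two counts for the changed node can be compared directly. The sketch below is
illustrative only; the function name is hypothetical, and it takes a list of predecessor
outcome counts so that predecessors need not all have the same number of outcomes (with
every entry equal to n it reproduces the expressions in the text).

```python
from math import prod


def encodings_for_changed_node(m, k, predecessor_outcomes, ignored_outcome_applies):
    """Number of probabilities to encode for a node A that grows from m to m+k outcomes.

    predecessor_outcomes lists the number of outcomes of each node in C(A).
    """
    combinations = prod(predecessor_outcomes)  # equals n**|C(A)| when all entries are n
    per_combination = k if ignored_outcome_applies else m + k - 1
    return per_combination * combinations
```

For a two-outcome node with one binary predecessor and one added outcome, the general
case requires four encodings and the special case two.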

A similar reduction can be found in the number of probability assessments for
direct successors of nodes with increased outcome spaces. The applicability of the
special case must be assessed for each direct successor node individually. Referring to the
simple example given above, suppose B was a direct successor of A. When the
probability distribution for B under the new state of information is given by

$P[B = b_j \mid A = a_i, C(B), \&'] = P[B = b_j \mid A = a_i, C(B), \&], \quad i = 1, \ldots, m$  (9)

where $a_i$ is in the set of original outcomes for A, then the only conditional distributions
which must be assessed for B are those which are conditioned on $a_{m+1}, \ldots, a_{m+k}$, the
new outcomes of A. This means the original distributions for B, given the old outcome
space for A, are still valid under the new state of information. For example, say the
old outcome space for A was {sunny, rainy}, and B was {get_wet, not_get_wet}. If the
outcome space for A is expanded to {sunny, rainy, snowy}, the expert must determine
whether or not the old distribution for "getting wet" or not, given that it is sunny (or
rainy), is still valid. Only those conditional distributions, associated with specific
outcomes of A, which the expert finds not to be valid must be reassessed. At the very
least, the conditional distributions for B, given the new outcomes of A, must be
assessed.

¹This supposition is only made for notational ease. All results remain valid when the
number of outcomes is allowed to vary for each conditional predecessor, but the number of
combinations of those outcomes is calculated differently.

The number of probability assessments required to determine B's distribution
depends on m, k, the number of outcomes for B (p), and the number of outcomes
for each of the variables in $C(B)\setminus A$.² For the general case,
$(m+k)\times(p-1)\times n^{\|C(B)\setminus A\|}$ probability assessments are needed. This is
reduced to $k\times(p-1)\times n^{\|C(B)\setminus A\|}$ when this special case applies.
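The bookkeeping for a direct successor can be illustrated with a minimal Python sketch. The dictionary layout, the function names `extend_successor_table` and `assessments_needed`, and all numeric values are hypothetical. The sketch assumes that A is the only conditional predecessor of B in the weather example, and that the expert has judged the original distributions to remain valid.

```python
def extend_successor_table(old_table, new_columns):
    """Return B's conditional table after A's outcome space grows.

    old_table:   {outcome of A: {outcome of B: probability}} under the
                 original state of information (reused unchanged).
    new_columns: same layout, holding only the distributions the expert
                 assessed for the new outcomes of A.
    """
    overlap = set(old_table) & set(new_columns)
    if overlap:
        raise ValueError(f"outcomes already present: {sorted(overlap)}")
    for a, dist in new_columns.items():
        if abs(sum(dist.values()) - 1.0) > 1e-9:
            raise ValueError(f"distribution for A={a} does not sum to one")
    # Copy the original columns, then append the newly assessed ones.
    table = {a: dict(dist) for a, dist in old_table.items()}
    table.update({a: dict(dist) for a, dist in new_columns.items()})
    return table


def assessments_needed(k, p, n=1, other_preds=0):
    """Special-case count: k * (p - 1) * n ** (number of predecessors other than A)."""
    return k * (p - 1) * n ** other_preds


# Hypothetical probabilities for the weather example.
old = {"sunny": {"get_wet": 0.05, "not_get_wet": 0.95},
       "rainy": {"get_wet": 0.80, "not_get_wet": 0.20}}
new = {"snowy": {"get_wet": 0.40, "not_get_wet": 0.60}}
table = extend_successor_table(old, new)
```

With one new outcome and two outcomes for B, `assessments_needed(1, 2)` gives a single new probability, while the two original columns are carried over untouched.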

To see these reductions more clearly, consider the metastatic cancer example (Figure 1)
introduced in Chapter II. If the expert determines there are k additional outcomes
of the BT node, then in general the number of probabilities which must be
encoded to obtain the new distribution for BT is $(2+k-1)\times 2^1$, or $2k+2$. Note that this
is exponential in the number of conditional predecessors for BT (one, in this case).
When this special case applies, this number is reduced to $2k$, since the two probabilities
encoded under the original state of information do not need to be reassessed. The size
of this reduction is equal to the amount of data already encoded under the original
state of information. Once the new probabilities are encoded, the appropriate 's can
be computed using Eq (8). Similarly, for the C node, the number of assessments
required to determine its distribution is, in general, $(2+k)\times(2-1)\times 2$, or $4+2k$. If the
original conditional distributions for node C (which were based on the original set of
outcomes for BT) are still valid, then only the probabilities for C which are conditioned
on the new outcomes of BT must be encoded, reducing the number of required assessments
to $2k$. If this special case is applicable to node H as well, then the number of
assessments needed to determine its distribution is reduced from $(2+k)\times(2-1)$ for the
general case to $(2-1)\times k$ for the ignored outcome special case.

² $C(B)\setminus A$ denotes the set of all conditional predecessors of B, excluding A.

Figures 3.a and 3.b show a graphical comparison of the relative number of assessments
required in the general case and the number required in the ignored outcome
case. Although these graphs show effects for up to 15 added outcomes, realistically only
five or fewer outcomes would be added.


Figure 3. Ignored Outcome Versus General Case Data Requirements
When Expanding A by k Outcomes
(a. Relative Effectiveness of Ignored Outcome Special Case for Node A;
b. Relative Effectiveness of Ignored Outcome Special Case for Node B;
horizontal axis: New Outcomes (k))


For each indicated value of m, the curves show the ratio of the number of assessments
required for the special case to the number required in the general case. This
ratio is $\frac{k}{m+k-1}$ for Figure 3.a, and $\frac{k}{m+k}$ for Figure 3.b. If A originally
has two outcomes, then gains another due to a change in the state of information, Figure 3.a
shows that the special case requires 1/2 as many probability assessments as the general case to
determine the distribution for A, and Figure 3.b shows that only 1/3 as many are
needed to determine the distribution for B.
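These ratios are easy to tabulate. The following Python sketch reads the ratios as k/(m+k-1) for node A and k/(m+k) for node B, and computes the underlying counts under the simplifying assumption (made in the text for notational ease) that every conditional predecessor has n outcomes. The function names are illustrative only.

```python
from fractions import Fraction


def general_count_a(m, k, n, num_preds):
    """General case for A: (m + k - 1) * n ** |C(A)| assessments."""
    return (m + k - 1) * n ** num_preds


def ignored_count_a(k, n, num_preds):
    """Ignored outcome case for A: only the k new outcomes are assessed."""
    return k * n ** num_preds


def general_count_b(m, k, p, n, num_other_preds):
    """General case for a successor B with p outcomes."""
    return (m + k) * (p - 1) * n ** num_other_preds


def ignored_count_b(k, p, n, num_other_preds):
    """Ignored outcome case for B: only columns for A's new outcomes."""
    return k * (p - 1) * n ** num_other_preds


def ratio_a(m, k):
    """Special-case to general-case ratio plotted in Figure 3.a."""
    return Fraction(k, m + k - 1)


def ratio_b(m, k):
    """Special-case to general-case ratio plotted in Figure 3.b."""
    return Fraction(k, m + k)


if __name__ == "__main__":
    for k in (1, 2, 5, 15):
        print(k, ratio_a(2, k), ratio_b(2, k))
```

For the BT node of the metastatic cancer example (m = 2, one predecessor with two outcomes), `general_count_a(2, k, 2, 1)` reproduces 2k + 2 and `ignored_count_a(k, 2, 1)` reproduces 2k.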

Notice that as the number of new outcomes increases, the relative effectiveness of
the special case decreases: as $k \to \infty$ the special case requires nearly as many probability
assessments as the general case for both nodes A and B. Conversely, as m increases,
the relative effectiveness of the special case increases. Both of these reflect that the
effectiveness of the special case depends on the amount of growth relative to the
amount of data for the given distribution in the original influence diagram.

Split Outcome Special Case. A similar special case exists for situations where
an outcome of a variable, say A, is split into two or more distinct outcomes. In effect,
the original outcome, say $A=a_s$, was actually many outcomes: $a_{s1}, a_{s2}, \ldots, a_{sk}$. Unless
the expert determines otherwise, the probabilities for the unchanged outcomes are still
valid under the new state of information. That is,

$$P[A=a_i \mid C(A), \&'] = P[A=a_i \mid C(A), \&], \qquad i \neq s \qquad (10)$$

for $a_i$ in the set of unchanged outcomes of A. The conditional probabilities for the new
outcomes can then be assessed directly, where

$$\sum_{i=1}^{k} P[A=a_{si} \mid C(A), \&'] = P[A=a_s \mid C(A), \&] \qquad (11)$$

for each combination of outcomes of the conditional predecessors of A. Alternatively,
these probabilities may be gathered as fractions of $P[A=a_s \mid C(A), \&]$. The knowledge
engineer would need to know, for each combination of C(A), the probability that
$A=a_{si}$, given that A is one of the new outcomes.

For either method of assessment, the sum of the new conditional probabilities is
known, thus reducing by one the required number of assessments for each combination
of C(A). This (minute) savings is not generally applicable for the ignored outcome special
case, since there is no a priori way to determine the sum of the ’s.

Just as for the ignored outcome case, direct successors of a “split outcome” node
do not have to have probabilities reassessed which are dependent on the unchanged
outcomes of A. Only probabilities conditioned on the new $a_{si}$ outcomes need be assessed.
Thus the new distribution is given by

$$P[A=a_i \mid C(A), \&'] = P[A=a_i \mid C(A), \&], \qquad i \neq s \qquad (12)$$

for the unchanged outcomes of A, and

$$P[A=a_{si} \mid C_j(A), \&'] = \lambda_{ij}\, P[A=a_s \mid C_j(A), \&], \qquad i = 1, \ldots, k \qquad (13)$$

where $\sum_{i=1}^{k} \lambda_{ij}$ equals one for each combination $C_j(A)$ of A's conditional predecessors.

Since A has $m+k-1$ outcomes under the new state of information, the number of
assessments required to form A's distribution (for the general case) is
$(m+k-2)\times n^{\|C(A)\|}$. When the split outcome case applies, this is reduced to
$(k-1)\times n^{\|C(A)\|}$, since the sum of the probabilities for the k new outcomes of A is known
from Eq (12).
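The “fractions” method of assessment can be sketched in a few lines of Python. This is only an illustration for a single combination of A's conditional predecessors: the function name `split_outcome`, the outcome names `drizzle` and `downpour`, and all numbers are hypothetical. Exact fractions are used so that the sums can be checked without rounding error.

```python
from fractions import Fraction


def split_outcome(dist, split, shares):
    """Split one outcome of A for a single combination of C(A).

    dist:   {outcome: probability} under the original state of information.
    split:  the outcome being divided (a_s in the text).
    shares: {new outcome: fraction of dist[split]}; must sum to one.
    """
    if sum(shares.values()) != 1:
        raise ValueError("shares must sum to one")
    # Unchanged outcomes keep their original probabilities.
    new_dist = {a: p for a, p in dist.items() if a != split}
    # Each new outcome receives its share of the split outcome's probability.
    for name, share in shares.items():
        new_dist[name] = share * dist[split]
    return new_dist


# Hypothetical values: "rainy" is split into "drizzle" and "downpour".
old = {"sunny": Fraction(3, 5), "rainy": Fraction(2, 5)}
shares = {"drizzle": Fraction(3, 4), "downpour": Fraction(1, 4)}
new = split_outcome(old, "rainy", shares)
```

Because the shares must sum to one, only k - 1 of them need to be elicited for each combination of predecessor outcomes; the last follows by subtraction, which is the source of the (k - 1) factor in the count above.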

Just as for the ignored outcome special case, any direct successor (B) of A may
not need all of its probabilities reassessed. If the expert determines that the original
conditional probabilities for B, given the unchanged outcomes $a_i$ ($i \neq s$) of A, are still
valid, then only the probabilities concerning A's new outcomes must be assessed. If B has p
outcomes, and each variable in $C(B)\setminus A$ has n outcomes, then for the general case
$(m+k-1)\times(p-1)\times n^{\|C(B)\setminus A\|}$ probability assessments are required. If the split
outcome case applies, the number of assessments is reduced to
$k\times(p-1)\times n^{\|C(B)\setminus A\|}$.

The graphs in Figures 4.a and 4.b show the effectiveness of the split outcome special
case relative to the general case. Similar to the graphs in Figure 3, these graphs
show the ratio of the required number of assessments for the split outcome special case
to those for the general case: $\frac{k-1}{m+k-2}$ for Figure 4.a, and $\frac{k}{m+k-1}$ for
Figure 4.b. Again, notice the same type of effect from the relative size of the increase in A: as k
becomes large relative to m, the effectiveness of the special case decreases. Although
the graphs show the effects for $k \le 15$, the range of interest is $k \le 5$.


Figure 4. Split Outcome Versus General Case Data Requirements
When Splitting an Outcome of A Into k Outcomes


Just as for the ignored outcome special case, the number of assessments required for the
split outcome special case grows exponentially with the number of conditional predecessors.
For both of these special cases, however, the required number of probability assessments is
some fraction of those required in the general case.

Changes in the Number of Variables. While the previous two special cases dealt
with the expansion of the outcome space for a particular variable, now we examine the
addition of a new variable into the diagram. While adding a variable to the diagram
increases the number of probabilities in the joint distribution by a factor equal to the
number of outcomes for the new variable, the number of additional probability
assessments required for an influence diagram depends primarily on the conditioning
changes brought about by the new variable.

For example, consider the following change to the metastatic cancer influence
diagram shown previously in Figure 1. A new disease, represented by a new variable with
outcomes {present, absent}, is discovered which can cause brain tumors. The joint
distribution, which had $2^5$ probabilities under the original state of information, now
contains $2^6$ probabilities (an increase of 32). The effect on the marginal and conditional
distributions is less, however. One probability must be assessed for the new node, and
four [$(2-1)\times 2^2$] probabilities must be assessed for node BT, increasing the total number
of probabilities needed for the influence diagram from eleven (under the old state of
information) to fourteen.
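The contrast between the joint distribution and the influence diagram can be checked with a short Python sketch. The node labels `MC` and `SC`, the label `X` for the new disease, and the arc structure are assumptions based on the usual form of the metastatic cancer example; they are chosen so that the counts match the eleven and fourteen assessments quoted in the text.

```python
from math import prod


def node_assessments(outcomes, parents, node):
    """(outcomes - 1) probabilities per combination of predecessor outcomes."""
    return (outcomes[node] - 1) * prod(outcomes[p] for p in parents[node])


def total_assessments(outcomes, parents):
    """Number of assessments needed to specify the whole diagram."""
    return sum(node_assessments(outcomes, parents, v) for v in parents)


def joint_size(outcomes):
    """Number of entries in the full joint distribution."""
    return prod(outcomes.values())


# Assumed structure of the original diagram (all variables binary).
outcomes = {"MC": 2, "SC": 2, "BT": 2, "C": 2, "H": 2}
parents = {"MC": [], "SC": ["MC"], "BT": ["MC"], "C": ["SC", "BT"], "H": ["BT"]}

# Add the new disease X as a second conditional predecessor of BT.
new_outcomes = dict(outcomes, X=2)
new_parents = dict(parents, X=[], BT=["MC", "X"])

before = total_assessments(outcomes, parents)
after = total_assessments(new_outcomes, new_parents)
```

Under these assumptions the joint distribution doubles from 32 to 64 entries, while the diagram needs only one assessment for the new node and four for BT.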

As indicated in this example, if the variable A is added to an influence diagram,
the number of probability assessments required to define A's distribution depends on
the number of outcomes for A and the number of outcomes for each conditional predecessor
of A. Since there was previously no information in the diagram regarding A, all
of these probabilities must be assessed. Additionally, the variables which now have A
as a conditional predecessor (i.e., the direct successors of A) must now have their
distributions reassessed. One special case, the “assumed constant outcome” case, was
identified that would reduce the number of required assessments.


Assumed Constant Outcome Special Case. One way that the number of assessments
can be reduced is if the old state of information, $\&$, is just the new state of information
with the added condition that $A=a_0$. This might be the case when an expert
learns that a factor previously considered constant did, in fact, have additional outcomes.
Part of the probabilities, for nodes which gain A as a conditional predecessor,
would then transfer directly from the original state of information to the new state of
information. Suppose B is a node with p outcomes that is conditioned (under the new
state of information, $\&'$) on the newly added A. If this special case applies, part of the
new probability distribution for B is given by

$$P[B=b_j \mid A=a_0, C(B), \&'] = P[B=b_j \mid C(B), \&], \qquad j = 1, \ldots, p \qquad (14)$$

If there were only two outcomes of A, then the conditional distributions of all the
direct successors of A would be half complete, a reduction of 50% in the number of
assessments.

This special case provides no reduction in the number of assessments required to
determine A's distribution, because originally there was no information about A in the
influence diagram. If A has k outcomes, and each of A's conditional predecessors has
n outcomes, then the number of assessments for both the general case and the constant
outcome special case is $(k-1)\times n^{\|C(A)\|}$. If the assumed constant outcome special case is
applicable to a direct successor like B, above, then the number of assessments required
to determine the probability distribution of B decreases. If each conditional predecessor
(other than A) of B has n outcomes, the number of assessments drops from
$k\times(p-1)\times n^{\|C(B)\setminus A\|}$ for the general case to $(k-1)\times(p-1)\times n^{\|C(B)\setminus A\|}$ for the
special case. As seen for the other special cases, the data requirements grow exponentially as the
number of conditional predecessors increases, but the original probabilities do not need
to be reassessed when the assumed constant outcome case holds.
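A minimal Python sketch of this transfer, for one combination of B's other conditional predecessors, is given below. The function names, the outcome labels, and the probabilities are all hypothetical; the labels loosely follow the intelligence-monitoring illustration used in this chapter.

```python
def condition_on_new_variable(old_dist, assumed_outcome, assessed):
    """Build B's distributions conditioned on a newly added variable A.

    old_dist:        {outcome of B: probability} from the original diagram.
    assumed_outcome: the outcome of A implicitly assumed originally (a_0).
    assessed:        {other outcome of A: {outcome of B: probability}},
                     newly elicited from the expert.
    """
    if assumed_outcome in assessed:
        raise ValueError("the assumed outcome must not be reassessed")
    # The original distribution transfers directly to the column A = a_0.
    table = {assumed_outcome: dict(old_dist)}
    table.update({a: dict(d) for a, d in assessed.items()})
    return table


def successor_assessments(k, p, n=1, other_preds=0, special_case=False):
    """k * (p - 1) * n ** other_preds in general; (k - 1) columns in the special case."""
    columns = k - 1 if special_case else k
    return columns * (p - 1) * n ** other_preds


# Hypothetical numbers: B is a maneuver, A is the relative capability.
old = {"attack": 0.7, "defend": 0.3}
assessed = {"parity": {"attack": 0.5, "defend": 0.5},
            "opponent_advantage": {"attack": 0.2, "defend": 0.8}}
table = condition_on_new_variable(old, "enemy_advantage", assessed)
```

Here A has k = 3 outcomes and B has p = 2, so the general case needs three assessments and the special case two.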

In practice, this is perhaps the special case whose applicability is easiest to recognize.
Recall the KBS described earlier in this chapter, which could be used to monitor intelligence
reports and draw conclusions about possible enemy maneuvers and tactics. Suppose the
original KBS was based on a state of information where the enemy clearly had the
upper hand: better technology, a greater number of weapon systems, and better training,
for instance. It is not at all unlikely that some, or all, of the probabilities of these
maneuvers would change if the enemy were fighting in a situation where 1) they had
the upper hand; 2) their opponent had the upper hand; or 3) neither side had the upper
hand (equivalent capabilities). Depending on the capabilities of the opposing force,
certain discriminators may indicate different tactics or maneuvers.

The graphs in Figures 5.a and 5.b show the relative number of assessments
required for the assumed constant outcome special case. As in the previous graphs,
these graphs show the ratio of the number of assessments required for the assumed
constant outcome special case to the number required in the general case. Since this special
case does not reduce the required number of assessments to determine the probability
distribution of the newly added variable, Figure 5.a shows that exactly the same
number of probabilities must be assessed (ratio = 1). Figure 5.b, like Figures 3 and 4,
shows the decreasing effectiveness of this special case as k becomes larger for any nodes
which gain A as a conditional predecessor and for which the special case applies. The
ratio for this graph is given by $\frac{k-1}{k}$.

Figure 5. Assumed Constant Outcome Versus General Case Data Requirements
When Adding New Variable A
(a. Relative Effectiveness of Assumed Constant Outcome Special Case for Node A;
b. Relative Effectiveness of Assumed Constant Outcome Special Case for Node B;
horizontal axis: New Outcomes (k))


Changes in Conditioning. When an arc is added between two nodes, say from A to
B, only B must have its probability distribution reassessed. Since the distribution for
the predecessor node (A) is defined by the conditional predecessors of A, no changes
must be made to A's distribution. The effect on B, however, is one seen earlier, in the
discussion concerning changes in the number of variables. In fact, adding an arc can be
viewed as a special case of adding a new variable. Thus the assumed constant outcome
special case may also be applicable when adding a new arc between existing nodes. For
B, the new direct successor of A, the situation is identical to that for a successor of a
newly added node: if the original probability distribution of B is valid for one outcome
of A, those values need not be gathered again.

The Importance of Conditional Independence. The importance of getting the
correct conditioning relationships in the influence diagram cannot be overstated. If
valid relationships are left out, the domain-specific knowledge base will be incomplete,
and the system may reach conclusions that differ greatly from the expert's. This would
eventually lead to a maintenance action to correct the discrepancy, much as rule-based
systems are updated when they reach incorrect conclusions. If unnecessary arcs are
included in the diagram, the number of probabilities which must be assessed is unduly
increased. The conclusions will be the same as those reached using the diagram without
the unnecessary arc, but more steps (i.e., more computer resources) will be required to
reach these conclusions.

Summary

Three special cases (ignored outcome, split outcome, and assumed constant outcome)
point to situations where part or all of the marginal and conditional probabilities
for nodes with changed incoming arcs can be used under the new state of information.
Although these special cases do provide some decrease in the number of probability
assessments which must be done to complete the modified influence diagram, their
applicability must be determined, by the expert, for each change that is made. Since
the number of required assessments, even when a special case applies, is exponential in
the number of conditional predecessors to the node being reassessed, the selection of the
minimum essential conditioning relationships is much more important in keeping the
number of assessments as low as possible.


IV. Conclusions and Recommendations


Throughout the first three chapters, various aspects of probabilistic knowledge-based
systems have been discussed, beginning with a discussion of the concerns which
have hampered the use of probability in many KBS's. Existing efforts to alleviate those
concerns, along with some of the advantages of using probabilities, were presented in
Chapters I and II. Chapter III explored ways to reduce the amount of new data encoding
when modifying a probabilistic knowledge-based system. This analysis was
motivated by the perceived difficulties in the encoding of probabilistic data, and the
difficulty of modifying a probabilistic KBS due to complex interactions among the variables.
The following conclusions are drawn from this research.


Conclusions

Importance of Graphical Representations. The first conclusion, evident from discussions
in Chapters II and III, is that the method chosen for representing the probabilistic
information is extremely important, both in the use and the maintenance of the KBS.
Recent research in this area has focused on the use of directed acyclic graphs, which
provide the following advantages.

1) Local modularity, provided by conditional independence within the graph,
limits the number of reassessments which must be done: only nodes which
have experienced a change to their outcome spaces or incoming arcs (conditional
dependencies) must be reassessed. This reduces the growth from being
exponential in the total number of variables in the graph to being exponential
in the number of conditional predecessors for each node. This reduction
could be quite significant, even for small influence diagrams. For instance,
when adding an arc from the 1(' node to the H node in the metastatic cancer
example, thirteen assessments are required to define all of the prior and conditional
distributions in the diagram, but only four new assessments are
needed to define H's new distribution.

2) Probabilities can be encoded and used in either the causal or evidential direction,
as opposed to the evidential direction required for rule-based systems.


3) Dependencies among variables can be represented easily, and more efficiently
than in rule-based representations.

4)  Shachter’s  algorithm  provides  a  simple  method  for  finding  any  probability 
distributions  of  interest,  conditioned  on  any  set  of  evidences. 

These advantages eliminate, or at least diminish, many of the concerns about using
probabilities in KBS's.

Rule-Based Versus Graphical Representations. Given the advantages of graphical
representations, one may think that such a representation would always be superior to
a rule-based representation. This is not necessarily the case, especially when the
dependencies found in the problem domain are not complex and the problem domain itself is
relatively stable (not dynamic). As the problem domain becomes more dynamic, however,
the ability to assess the probabilities in the direction (causal or evidential) most
convenient to the expert makes the use of influence diagrams (or Bayesian networks)
more desirable. Whether the problem domain is dynamic or not, as the dependencies in
the knowledge base become more complex, the graphical representations again become
more desirable.

Applicability of the Special Cases. Some reduction in the required number of probability
assessments may be gained in situations where the special cases (ignored outcome,
split outcome, or assumed constant outcome), discussed in Chapter III, apply. How
often they will be applicable is unclear and will remain so until probabilistic KBS's are
built, operated, and maintained over a period of time. However, the underlying
assumptions for each special case seem to be reasonably realistic, so it would not be
surprising to find that they apply in real decision problems.

Two characteristics of the special cases may limit their overall usefulness. First,
the expert must determine their applicability to each change made in the knowledge
base: they may not (and probably will not) apply in many situations. Second, since
the quantity of data contained in the original graph becomes smaller (relative to the
total amount in the modified graph) as k increases, the effectiveness of the special cases
decreases, requiring almost as many assessments as the general case. Given the advantages
already provided by the graphical representations, the limited applicability and
effectiveness of the special cases indicate that research aimed at using the original
probabilities is less likely to yield significant results than research in other areas.

Areas  for  Future  Research 

Efficient Propagation Techniques for Multiply Connected Graphs. One area which
holds significant promise is the efficient and rapid propagation of probabilities
throughout a multiply connected graph. As indicated in Chapter II, it is possible to
apply Pearl's local propagation technique, but current methods suffer from exponential
growth and an inability to provide a realistic interpretation for “auxiliary variables”
added to transform the graph into a singly connected one. The question of interpretation
may become a moot point if an algorithm can be developed which can convert a
fully specified, multiply connected graph into a singly connected graph by adding auxiliary
variables only to the internal portion of the graph, where they are neither seen as
evidences nor of interest as hypotheses.

The Effect of Different Uncertainty Representations. Another area requiring much
more research, as indicated by Henrion, is the effect of using different methods (probability,
certainty factors, fuzzy set theory, etc.) to represent uncertainty in KBS's. Is
there really any difference between conclusions reached using normative representations
(i.e., probability) and those reached using descriptive representations, such as fuzzy set
theory? If so, under what conditions are the differences pronounced, and which method
gives the “best” answer? If not, which is the easiest method to implement, or the most
efficient?

More Efficient Probability Encoding Methods. For the maintenance of probabilistic
knowledge-based systems, exploring ways to make the encoding of probabilities more
efficient may be the most lucrative area to examine. Results from such a study would
be equally applicable to artificial intelligence and decision analysis. Using a knowledge-based
system to help guide, or to some extent automate, the encoding process is one
possible approach.

Identification of Conditional Independencies. As indicated in Chapter III, unnecessary
dependencies within an influence diagram increase the amount of effort, both in the
building (probability assessments) and in the operation (mathematical computations) of
a probabilistic KBS. While it is not likely that such unnecessary dependencies would be
included in the original system, during maintenance efforts the expert and knowledge
engineer may be taking a more restricted view of the system, focusing on a few particular
nodes. This may prevent them from noticing that some of the arcs they are adding
are unnecessary, given the other dependencies already in the system.

How can such unnecessary arcs be identified? The definition of conditional
independence may provide the answer. If an arc from A to B is unnecessary, then the
probabilities for outcomes of B are given by

$$P[B=b_j \mid A=a_i, C(B)\setminus A, \&] = P[B=b_j \mid C(B)\setminus A, \&] \qquad (15)$$

for all outcomes $a_i$ of A, and $b_j$ of B. Using the strict equality may be unreasonable in
actual use, however. It may be more reasonable to ask the expert to determine if an arc
is necessary when this equation is approximately true. Checking for this approximate
equality only needs to be performed when the probability distribution of a node is
either initially assessed, or modified during diagram maintenance. Such a routine can
easily be added to systems like ALTERID, as a post-processor for the routine which
accepts the node distributions from the knowledge engineer or the expert. Further
research examining ways to bring these previously unrecognized conditional independencies,
and other unrecognized implications of the probabilistic knowledge base, to the
expert's attention may be beneficial. Recognizing these implications may give the
expert a better understanding of the dependency structure, and reduce the number of
required assessments for future diagram maintenance. Also, the impact of incorrectly
asserting such underlying implications needs to be examined.
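One possible form of such a check is sketched below in Python. It is an illustration only and does not describe ALTERID or any existing routine: the function name, the table layout, and the tolerance of 0.05 are assumptions. Rather than computing the right-hand side of Eq (15) directly, the sketch uses a sufficient condition: if the distributions of B agree within the tolerance across all outcomes of A, then any mixture of them over A also lies within that tolerance of each one.

```python
def arc_seems_unnecessary(table, tol=0.05):
    """Flag an arc A -> B as a candidate for removal.

    table: {combination of B's other predecessors:
               {outcome of A: {outcome of B: probability}}}
    Returns True when, for every combination, the probability of each
    outcome of B varies by at most tol across the outcomes of A.
    """
    for by_a in table.values():
        rows = list(by_a.values())
        for b in rows[0]:
            values = [row[b] for row in rows]
            if max(values) - min(values) > tol:
                return False
    return True


# Hypothetical assessments for a binary B with one other predecessor.
nearly_independent = {
    "c1": {"a1": {"b1": 0.30, "b2": 0.70}, "a2": {"b1": 0.32, "b2": 0.68}},
    "c2": {"a1": {"b1": 0.81, "b2": 0.19}, "a2": {"b1": 0.80, "b2": 0.20}},
}
dependent = {
    "c1": {"a1": {"b1": 0.30, "b2": 0.70}, "a2": {"b1": 0.70, "b2": 0.30}},
}
```

A positive result would only prompt a question to the expert, in keeping with the text: the decision to drop the arc remains a judgment about the problem domain, not a numerical one.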

Summary 

Attempts to keep the original probabilities usable, in some form, appear to be of
limited value. There are, however, many other areas where research may provide
interesting and possibly significant results. Research in these areas should enhance the
ability of knowledge-based systems to reason under uncertain conditions, using normative
decision models as the basis for recommended actions, thereby increasing the
usefulness of KBS's in an uncertain world.